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Abstract 

Data assimilation is an iterative approach to the problem of estimating the state of a dy- 
namical system using both current and past observations of the system together with a model 
for the system's time evolution. Rather than solving the problem from scratch each time new 
observations become available, one uses the model to "forecast" the current state, using a prior 
state estimate (which incorporates information from past data) as the initial condition, then uses 
current data to correct the prior forecast to a current state estimate. This Bayesian approach 
is most effective when the uncertainty in both the observations and in the state estimate, as it 
evolves over time, are accurately quantified. In this article, we describe a practical method for 
data assimilation in large, spatiotemporally chaotic systems. The method is a type of "ensemble 
Kalman filter", in which the state estimate and its approximate uncertainty are represented at 
any given time by an ensemble of system states. We discuss both the mathematical basis of 
this approach and its implementation; our primary emphasis is on ease of use and computa- 
tional speed rather than improving accuracy over previously published approaches to ensemble 
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Kalman filtering. We include some numerical results demonstrating the efficiency and accuracy 
of our implementation for assimilating real atmospheric data with the global forecast model 
used by the U.S. National Weather Service. 

1 Introduction 

Forecasting a physical system generally requires both a model for the time evolution of the system 
and an estimate of the current state of the system. In some applications, the state of the system 
can be measured directly with high accuracy. In other applications, such as weather forecasting, 
direct measurement of the global system state is not feasible. Instead, the state must be inferred 
from available data. While a reasonable state estimate based on current data may be possible, in 
general one can obtain a better estimate by using both current and past data. "Data assimilation" 
provides such an estimate on an ongoing basis, iteratively alternating between a forecast step and 
a state estimation step; the latter step is often called the "analysis". The analysis step combines 
information from current data and from a prior short-term forecast (which is based on past data), 
producing a current state estimate. This estimate is used to initialize the next short-term forecast, 
which is subsequently used in the next analysis, and so on. The data assimilation procedure is itself 
a dynamical system driven by the physical system, and the practical problem is to achieve good 
"synchronization" ||40)| between the two systems. 

Data assimilation is widely used to study and forecast geophysical systems [fT3ll28l . The analy- 
sis step is generally a statistical procedure (specifically, a Bayesian maximum likelihood estimate) 
involving a prior (or "background") estimate of the current state based on past data, and current data 
(or "observations") that are used to improve the state estimate. This procedure requires quantifi- 
cation of the uncertainty in both the background state and the observations. While quantifying the 
observation uncertainty can be a nontrivial problem, in this article we consider that problem to be 
solved, and instead concentrate on the problem of quantifying the background uncertainty. 

There are two main factors that create background uncertainty. One is the uncertainty in the 
initial conditions from the previous analysis, which produces the background state via a short-term 
forecast. The other is "model error", the unknown discrepancy between the model dynamics and 
actual system dynamics. Quantifying the uncertainty due to model error is a challenging problem, 
and while this problem generally cannot be ignored in practice, we discuss only crude ways of 
accounting for it in this article. For the time being, let us consider an idealized "perfect model" 
scenario, in which there is no model error. 

The main purpose of this article is to describe a practical framework for data assimilation that 
is both relatively easy to implement and computationally efficient, even for large, spatiotemporally 
chaotic systems. (By "spatiotemporally chaotic" we mean a spatially extended system that exhibits 
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temporally chaotic behavior with weak long-range spatial correlations.) The emphasis here is on 
methodology that scales well to high-dimensional systems and large numbers of observations, rather 
than on what would be optimal given unlimited computational resources. Ideally, one would keep 
track of a probability distribution of system states, propagating the distribution using the Fokker- 
Planck-Kolmogorov equation during the forecast step. While this approach provides a theoretical 
basis for the methods used in practice [|25l , it would be computationally expensive even for a low- 
dimensional system and is not at all feasible for a high-dimensional system. Instead one can use 
a Monte Carlo approach, using a large ensemble of system states to approximate the distribution 
(see [6] for an overview), or a parametric approach like the Kalman filter 112611271 . which assumes 
Gaussian distributions and tracks their mean and covariance. (The latter approach was derived 
originally for linear problems, but serves as a reasonable approximation for nonlinear problems 
when the uncertainties remain sufficiently small.) 

The methodology of this article is based on the Ensemble Kalman Filter EHHUl, wn i cn has 
elements of both approaches: it uses the Gaussian approximation and follows the time evolution of 
the mean and covariance by propagating an ensemble of states. The ensemble can be reasonably 
small relative to other Monte Carlo methods because it is used only to parametrize the distribution, 
not to sample it thoroughly. The ensemble should be large enough to approximately span the space 
of possible system states at a given time, because the analysis essentially determines which linear 
combination of the ensemble members forms the best estimate of the current state, given the current 
observations. 

Many variations on the Ensemble Kalman Filter have been published in the geophysical liter- 
ature, and this article draws ideas from a number of them [|Tl |2l [4l HTl l20l [2T1 |30l [36l [37l |45l [48ll. 
These articles in turn draw ideas both from earlier work on geophysical data assimilation and from 
the engineering and mathematics literature on nonlinear filtering. For the most part, we limit our 
citations to ensemble-based articles rather than attempt to trace all ideas to their original sources. 
We call the method described here a Local Ensemble Transform Kalman Filter (LETKF), because 
it is most closely related to the Local Ensemble Kalman Filter 11361 l37l and the Ensemble Trans- 
form Kalman Filter 01 . Indeed, it can produce analyses that are equivalent to the LEKF in a more 
efficient manner that is formally similar to the ETKF. While this article does not describe a fun- 
damentally new method for data assimilation, it proposes a significant refinement of previously 
published approaches that combines formal simplicity with the flexibility to adapt to a variety of 
applications. 

In Section 2, we start by posing a general problem about which trajectory of a dynamical system 
"best fits" a time series of data; this problem is solved exactly for linear problems by the Kalman 
filter and approximately for nonlinear problems by ensemble Kalman filters. Next we derive the 
Kalman filter equations as a guide for what follows. Then we discuss ensemble Kalman filters 
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in general and the issue of "localization", which is important for applications to spatiotemporally 
chaotic systems. Finally, we develop the basic LETKF equations, which provide a framework for 
data assimilation that allows a system-dependent localization strategy to be developed and tuned. 
We discuss also several options for "covariance inflation" to compensate for the effects of model 
error and the deficiencies due to small sample size and linear approximation that are inherent to 
ensemble Kalman filters. 

In Section 3, we give step-by-step instructions for efficient implementation of the approach de- 
veloped in Section 2 and discuss options for further improving computational speed in certain cases. 
Then in Section 4, we present a generalization that allows observations gathered at different times to 
be assimilated simultaneously in a natural way. In Section 5, we present preliminary results using a 
global atmospheric forecast model with real observations; these results compare favorably with the 
data assimilation method used by the National Weather Service, and demonstrate the feasibility of 
the LETKF algorithm for large models and data sets. Section 6 is a brief conclusion. The notation 
in this article is based largely on that proposed in [|24l . with some elements from 11371 . 



2 Mathematical Formulation 

Consider a system governed by the ordinary differential equation 

|=FM, (1) 

where x is an m-dimensional vector representing the state of the system at a given time. Suppose 
we are given a set of (noisy) observations of the system made at various times, and we want to 
determine which trajectory {x(f)} of ® "best" fits the observations. For any given t, this trajectory 
gives an estimate of the system state at time t. 

To formulate this problem mathematically, we need to define "best fit" in this context. Let us 
assume that the observations are the result of measuring quantities that depend on the system state 
in a known way, with Gaussian measurement errors. In other words, an observation at time t; is a 
triple (ypHjjKj), where y" is a vector of observed values, and Hj and R ; describe the relationship 
between y° and x(fy): 

y°j=HMtj))+£j, 

where Ej is a Gaussian random variable with mean and covariance matrix R,. Notice that we are 
assuming a perfect model here: the observations are based on a trajectory of (CD), and our problem 
is simply to infer which trajectory produced the observations. In a real application, the observations 
come from a trajectory of the physical system for which (CD) is only a model. So a more realistic (but 
more complicated) problem would be to determine a pseudo-trajectory of (CD), or a trajectory of an 
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associated stochastic differential equation, that best fits the observations. Formulating this problem 
mathematically then requires some assumptions about the size and nature of the model error. We 
use the perfect model problem as motivation and defer the consideration of model error until later. 

Given our assumptions about the observations, we can formulate a maximum likelihood estimate 
for the trajectory of CQ) that best fits the observations at times t\ < t2 < ■ ■ ■ < t n . The likelihood of a 
trajectory x(t) is proportional to 

nexp [-^[y o j-m<m T ^jVj-HMtj))^ . 

The most likely trajectory is the one that maximizes this expression, or equivalently minimizes the 
"cost function" 

J°({*(t)}) = ElyJ-ff/W^))]^ 1 ^-^/^))]. (2) 

Thus, the "most likely" trajectory is also the one that best fits the observations in a least square 
sense. 

Notice that © expresses the cost J° as a function of the trajectory {x(f )}. To minimize the cost, 
it is more convenient to write J° as a function of the system state at a particular time t. Let M tt i be 
the map that propagates a solution of (0Q) from time t to time f'Q Then 

= Ew-^c^W)]^ 1 ^-^^/*))] ( 3 ) 

expresses the cost in terms of the system state x at time t. Thus to estimate the state at time t, we 
attempt to minimize J". 

For a nonlinear model, there is no guarantee that a unique minimizer exists. And even if it does, 
evaluating J° is apt to be computationally expensive, and minimizing it may be impractical. But 
if both the model and the observation operators Hj are linear, the minimization is quite tractable, 
because J° is then quadratic. Furthermore, instead of performing the minimization from scratch at 
each successive time t n , one can compute the minimizer by an iterative method, namely the Kalman 
filter ||26ll27ll . which we now describe in the perfect model scenario. This method forms the basis 
for the approach we will use in the nonlinear scenario. 

2.1 Linear Scenario: the Kalman Filter 

In the linear scenario, we can write M tt /(x) = M / ? /x and Hj(x) = Hjx where M f-f / and Hy are 
matrices. Using the terminology from the introduction, we now describe how to perform a forecast 

the derivations that follow, we allow t' to be less than t , though in practice integrating (Q~|i backward in time may 
be problematic — for example, if (Q3 represents a discretization of a dissipative partial differential equation. Our use of 
M tt i for t' < t is entirely expository; the methodology we develop will not require backward integration of (HJ. 
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step from time t n -\ to time t n followed by an analysis step at time t n , in such a way that if we start 
with the most likely system state, in the sense described above, given the observations up to time 
t n -\, we end up with the most likely state given the observations up to time t n . The forecast step 
propagates the solution from time t n -\ to time t n , and the analysis step combines the information 
provided by the observations at time t n with the propagated information from the prior observations. 
This iterative approach requires that we keep track of not only the most likely state, but also its 
uncertainty, in the sense described below. (Of course, the fact that the Kalman filter computes the 
uncertainty in its state estimate may be viewed as a virtue.) 

Suppose the analysis at time f„_i has produced a state estimate x?_j and an associated covariance 
matrix P^_j. In probabilistic terms, x"_j and P"_j represent the mean and covariance of a Gaussian 
probability distribution that represents the relative likelihood of the possible system states given the 
observations from time t\ to t n -\. Algebraically, what we assume is that for some constant c, 

EW-H^^^lyJ-HyM^,^] = [x-xUfiK-ir'lx-K-J+c- (4) 

In other words, the analysis at time t n -\ has "completed the square" to express the part of the 
quadratic cost function Jf that depends on the observations up to that time as a single quadratic 
form plus a constant. The Kalman filter determines x" and P" such that an analogous equation holds 
at time t n . 

First we propagate the analysis state estimate x^_j and its covariance using the forecast 
model to produce a background state estimate x b n and covariance matrix for the next analysis: 

x b n = M tn _ ut X-i, (5) 

Pj = M^ lA PS_,Mj_ lA . (6) 

Under a linear model, a Gaussian distribution of states at one time propagates to a Gaussian distri- 
bution at any other time, and the equations above describe how the model propagates the mean and 
covariance of such a distribution. (Usually, the Kalman filter adds a constant matrix to the right side 
of © to represent additional uncertainty due to model error.) 

Next, we want to rewrite the cost function J° given by © in terms of the background state 
estimate and the observations at time t n . (This step is often formulated as applying Bayes' rule to 
the corresponding probability density functions.) In ©, x represents a hypothetical system state at 
time t n -\. In our expression for J° , we want x to represent instead a hypothetical system state at 
time t n , so we first replace x by M ttt)tn _ { x = M.^_ ltn x in ©. Then using © and © yields 

E^yy-H^.^^RTl^-H.-M^.x] = [x-x^rHx-x^+c. 

7=1 



6 



It follows that 



Jl (x) = [x - x b n ] T (P h n )- 1 [x - x»] + [y° - H n xY R^[y° - H„x] + c 



^b 



lr.,o 



(7) 



To complete the data assimilation cycle, we determine the state estimate x". and its covariance 



P" so that 



j°(x) = [x-r n ] T m- l [x-r n ]+ c ' 

for some constant c' . Equating the terms of degree 2 in x, we get 

-l 



■pa 
r n 



(P n ) 1 +H,^R„ i H„ 



T n -U 



Equating the terms of degree 1 , we get 



\* n) *n ' ll n Jn 



(8) 



(9) 



Notice that when the model state is observed directly, H„ is the identity matrix, and equation © 
expresses the analysis state estimate as a weighted average of the background state estimate and the 
observations, weighted according to the inverse covariance of each. 

Equations © and © can be written in many different but equivalent forms, and it will be useful 
later to rewrite both of them now. Using ® to eliminate (P*) _1 from © yields 



K = * b n+KKK L (y°n-K n x b n ). 



(10) 



The matrix P^H^R^ 1 is called the "Kalman gain". It multiplies the difference between the obser- 
vations at time t n and the values predicted by the background state estimate to yield the increment 
between the background and analysis state estimates. Next, multiplying ® on the right by (P*) ~ 1 P* 
and combining the inverses yields 



(I + P*H^R n ! H n ) 1 P*. 



(11) 



This expression provides a more efficient way than ® to compute P^, since it does not require 
inverting P,^. 



Initialization. The derivation above of the Kalman filter avoids the issue of how to initialize the 
iteration. To solve the best fit problem we originally posed, we should make no assumptions about 
the system state prior to the analysis at time t\. Formally we can regard the background covariance 
Pj to be infinite, and for n = 1 use © and © with (Pj) _1 = 0. This works if there are enough 
observations at time t\ to determine (aside from the measurement errors) the system state; that is, 
if Hi has rank equal to the number of model variables m. The analysis then determines x" in the 
appropriate least-square sense. However, if there are not enough observations, then the matrix to be 
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inverted in © does not have full rank. To avoid this difficulty, one can assume a prior background 
distribution at time t\, with reasonably large but finite. This adds a small quadratic term to the 
cost function being minimized, but with sufficient observations over time, the effect of this term on 
the analysis at time t n decreases in significance as n increases. 

2.2 Nonlinear Scenario: Ensemble Kalman Filtering 

Many approaches to data assimilation for nonlinear problems are based on the Kalman filter, or at 
least on minimizing a cost function similar to ©. At a minimum, a nonlinear model forces a change 
in the forecast equations (0) and ©, while nonlinear observation operators H n force a change in the 
analysis equations (flOl) and (fTTT) . The extended Kalman filter (see, for example, 11251 ) computes 

= ^/„_ 1 ,r„(x^_ 1 ) using the nonlinear model, but computes using the linearization M fn _ xfn of 
M tn l tn around x" j. The analysis then uses the linearization H„ of H n around x*. This approach is 
problematic for complex, high-dimensional models such as a global weather model for (at least) two 
reasons. First, it is not easy to linearize such a model. Second, when the number of model variables 
m is several million, computations involving the m x m covariance matrices are very expensive. 

Approaches used in operational weather forecasting generally eliminate, for pragmatic reasons, 
the time iteration of the Kalman filter. The U.S. National Weather Service performs data assimi- 
lation every 6 hours using the "3D-Var" method |[32~ll38l . in which the background covariance P* 
in © is replaced by a constant matrix B representing typical uncertainty in a 6-hour forecast. This 
simplification allows the analysis to be formulated in a manner that precomputes the most expensive 
matrix operations, so that they do not have to be repeated at each time t n . The 3D-Var cost func- 
tion also allows a nonlinear observation operator H n , and is minimized numerically to produce the 
analysis state estimate x". 

The "4D-Var" method I13T1 l42| used by the European Centre for Medium-Range Weather Fore- 
casts uses a cost function that includes a constant-covariance background term as in 3D-Var, together 
with a sum like © accounting for the observations collected over a 12-hour time window. Again 
the cost function is minimized numerically; this procedure is computationally intensive, because 
computing the gradient of the 4D-Var cost function requires integrating both the nonlinear model 
and its linearization over the 12-hour window, and this procedure is repeated until a satisfactory 
approximation to the minimum is found. 

The key idea of ensemble Kalman filtering [|71|9l is to choose at time t n -\ an ensemble of initial 
conditions whose spread around x?_j characterizes the analysis covariance Pf_i, propagate each 
ensemble member using the nonlinear model, and compute P* based on the resulting ensemble at 
time t n . Thus like the extended Kalman filter, the (approximate) uncertainty in the state estimate is 
propagated from one analysis to the next, unlike 3D-Var (which does not propagate the uncertainty 
at all) or 4D-Var (which propagates it only with the time window over which the cost function is 
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minimized). Furthermore, ensemble Kalman filters do this without requiring a linearized model. 
On the other hand, 4D-Var (with "weak constraint") allows a wide variety of model error terms to 
be incorporated into the cost function. 

In spite of their differences, though, we emphasize that in the absence of computational limi- 
tations, 4D-Var and ensemble Kalman filtering should be able to produce similar results because 
they both seek to minimize the same type of cost function. Indeed, in a perfect model scenario 
IfTTl . we obtained similar results with both methods when we used a sufficiently long time window 
for 4D-Var and when we used enough ensemble members and performed the analysis sufficiently 
frequently in a 4D version (described in Section|4]of this article) of our LETKF. In atmospheric data 
assimilation, ensemble Kalman filtering has not yet equaled the best results using 4D-Var, but it has 
begun to achieve results that compare favorably with operational 3D- Var results [|22l l46l l34l ; see 
also Section [51 

Perhaps the most important difference between ensemble Kalman filtering and the other methods 
described above is that the former quantifies uncertainty only in the space spanned by the ensemble. 
Assuming that computational resources restrict the number of ensemble members k to be much 
smaller than the number of model variables m, this can be a severe limitation. On the other hand, if 
this limitation can be overcome (see the section on "Localization" below), then the analysis can be 
performed in a much lower-dimensional space (k versus m). Thus, ensemble Kalman filtering has 
the potential to be more computationally efficient than the other methods. Indeed, the main point 
of this article is to describe how to do ensemble Kalman filtering efficiently without sacrificing 
accuracy. 

Notation. We start with an ensemble {x"_ j : i = 1, 2, . . . ,k} of m-dimensional model state vectors 
at time t n -\. One approach would be to let one of the ensemble members represent the best estimate 
of the system state, but here we assume the ensemble to be chosen so that its average represents 
the analysis state estimate. We evolve each ensemble member according to the nonlinear model to 
obtain a background ensemble {x*^ : i = 1 , 2, . . . , k} at time t n : 

_ M („«(*) \ 

For the rest of this article, we will discuss what to do at the analysis time t n , and so we now drop the 
subscript n. Thus, for example, H and R will represent respectively the observation operator and the 
observation error covariance matrix at the analysis time. Let I be the number of scalar observations 
used in the analysis. 

For the background state estimate and its covariance, we use the sample mean and covariance of 
the background ensemble: 

(=i 
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k 

V b = (fc-l)- 1 £(x^-x*)(x^-x fo ) r = (k-\y l X b (X h ) T 1 (12) 
(=1 

where X b is the m x k matrix whose zth column is x*W — x b . (Notice that the rank of P b is equal 
to the rank of X*, which is at most k — 1 because the sum of its columns is 0. Thus, the ensemble 
size limits the rank of the background covariance matrix.) The analysis must determine not only an 
state estimate x a and covariance P a , but also an ensemble {x a W : i = 1,2,..., A:} with the appropriate 
sample mean and covariance: 

k 

(=1 

k 

P a = (k-\)- 1 £(x fl W -x fl )(x fl W -x a ) T = (k- l)- 1 X a (X a ) r , (13) 

(=1 

where X a is the m x k matrix whose zth column isx a ^ — x a . 

In Section [2731 we will describe how to determine x a and P fl for a (possibly) nonlinear observa- 
tion operator H in a way that agrees with the Kalman filter equations (TTOb and (fTTI) in the case that 
H is linear. 

Choice of analysis ensemble. Once x a and P a are specified, there are still many possible choices 
of an analysis ensemble (or equivalently, a matrix X a that satisfies (fT3l and the sum of whose 
columns is zero). Many ensemble Kalman filters have been proposed, and one of the main differ- 
ences among them is how the analysis ensemble is chosen. The simplest approach is to apply the 
Kalman filter update (flOl) separately to each background ensemble member (rather than their mean) 
to get the corresponding analysis ensemble member. However, this results in an analysis ensemble 
whose sample covariance is smaller than the analysis covariance P a given by (fTT|) . unless the obser- 
vations are artificially perturbed so that each ensemble member is updated using different random 
realization of the perturbed observations B51|20l. Ensemble square-root filters ifTl l45l l4l l43l l36l l37l 
instead use more involved but deterministic algorithms to generate an analysis ensemble with the 
desired sample mean and covariance. As such, their analyses coincide exactly with the Kalman filter 
equations in the linear scenario of the previous section. We will use this deterministic approach in 
Section [23] 

Localization. Another important issue in ensemble Kalman filtering of spatiotemporally chaotic 
systems is spatial localization. If the ensemble has k members, then the background covariance 
matrix P b given by (fT2l) describes nonzero uncertainty only in the (at most) /c-dimensional subspace 
spanned by the ensemble, and a global analysis will allow adjustments to the system state only in 
this subspace. If the system is high-dimensionally unstable — more precisely, if it has more than 
k positive Lyapunov exponents — then forecast errors will grow in directions not accounted for by 
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the ensemble, and these errors will not be corrected by the analysis. On the other hand, in a suffi- 
ciently small local region, the system may behave like a low-dimensionally unstable system driven 
by the dynamics in neighboring regions; such behavior was observed for a global weather model 
in 113911351 . Performing the analysis locally requires the ensemble to represent uncertainty in only 
the local unstable space. By allowing the local analyses to choose different linear combinations of 
the ensemble members in different regions, the global analysis is not confined to the ^-dimensional 
ensemble space and instead explores a much higher-dimensional space [T2l |36l 1371 . Another ex- 
planation for the necessity of localization for spatiotemporally chaotic systems is that the limited 
sample size provided by an ensemble will produce spurious correlations between distant locations 
in the background covariance matrix P b ||20~1 [T71 . Unless they are suppressed, these spurious cor- 
relations will cause observations from one location to affect, in an essentially random manner, the 
analysis in locations an arbitrarily large distance away. If the system has a characteristic "correla- 
tion distance", then the analysis should ignore ensemble correlations over much larger distances. In 
addition to providing better results in many cases, localization allows the analysis to be done more 
efficiently as a parallel computation [130113611371 . 

Localization is generally done either explicitly, considering only the observations from a region 
surrounding the location of the analysis ll29l l20l l30l HJ |36l (37J, or implicitly, by multiplying the 
entries in by a distance-dependent function that decays to zero beyond a certain distance, so 
that observations do not affect the model state beyond that distance [)2TlfT7ll45l . We will follow the 
explicit approach here, doing a separate analysis for each spatial grid point of the model. (Our use of 
"grid point" assumes the model to be a discretization of a partial differential equation, or otherwise 
to be defined on a lattice, but the method is also applicable to systems with other geometries.) The 
choice of which observations to use for each grid point is up to the user of the method, and a good 
choice will depend both on the particular system being modeled and on the size of the ensemble 
(more ensemble members generally allow more distant observations to be used gainfully). It is 
important, however, that most of the observations used in the analysis at a particular grid point also 
be used in the analysis at neighboring grid points. This ensures that the analysis ensemble does not 
change suddenly from one grid point to the next. For an atmospheric model, a reasonable approach 
is to use observations within a cylinder of a given radius and height centered at the analysis grid 
point and to determine empirically which values of the radius and height work best. At its simplest, 
the method we describe gives all of the chosen observations equal influence on the analysis, but we 
will also describe how to make their influence decay gradually toward zero as their distance from 
the analysis location increases. 

Initial background ensemble. A common method for generating a background ensemble to use 
at the first analysis time is to run the model for a while and to select model states at different ran- 
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domly chosen times. The intent of this method is for the initial background ensemble to be sampled 
from a climatological distribution. This is a reasonable choice for the background distribution when 
no prior observations are available. 



2.3 LETKF: A Local Ensemble Transform Kalman Filter 

We now describe an efficient means of performing the analysis that transforms a background en- 
semble {x*W : i= 1,2, ...,&} into an appropriate analysis ensemble {x a ^ : i = 1,2, ...,&}, using 
the notation defined above. We assume that the number of ensemble members k is smaller than both 
the number of model variables m and the number of observations £§| even when localization has 
reduced the effective values of m and £ considerably compared to a global analysis. (In this section 
we assume that the choice of observations to use for the local analysis has been performed already, 
and consider y°, H, and R to be truncated to these observations; as such, correlations between errors 
in the chosen observations and errors in other observations are ignored.) Most of the analysis takes 
place in a ^-dimensional space, with as few operations as possible in the model and observation 
spaces. 

Formally, we want the analysis mean x a to minimize the Kalman filter cost function CD), modified 
to allow for a nonlinear observation operator H: 

J(x) = (x-x b ) T (P h )-\x-x b ) + [y -H(x)} T R- l [y°-H(x)}. (14) 

However, the m x m background covariance matrix = (k — l)~ l X b (X b ) T has rank at most k—l, 
and is therefore not invertible. Nonetheless, as a symmetric matrix, it is one-to-one on its col- 
umn space S, which is also the column space of X b , or in other words the space spanned by the 
background ensemble perturbations. So in this sense, (P*)- 1 is well-defined on S. Then J is also 
well-defined for x — x b in S, and the minimization can be carried out in this subspaceO As we have 
said, this reduced dimensionality is an advantage from the point of view of efficiency, though the 
restriction of the analysis mean to S is sure to be detrimental if k is too small. 

To perform the analysis on S, we must choose an appropriate coordinate system. A natural 
approach is to use the singular vectors of X* (the eigenvectors of P*) to form a basis for S lfTl l36ll37ll . 
Here we avoid this step by using instead the columns of X* to span S, as in 01. One conceptual 
difficulty in this approach is that the sum of these columns is zero, so they are necessarily linearly 



2 This assumption is only expository; our algorithm does not require an upper bound on k, but it is less efficient than 
doing the analysis in model space if m < k or in observation space if I < k. 

Considerably more general cost functions can be used, at the expense of having to perform the minimization 
numerically in the ensemble space S. Since this space is relatively low-dimensional, this approach is still feasible even 



for high-dimensional systems. In lfl8l we use a non-quadratic background term within the LETKF framework, and (48] 
uses a similar approach to allow a non-quadratic observation term. 



12 



dependent. We could assume the first k—1 columns to be independent and use them as a basis, but 
this assumption is unnecessary and clutters the resulting equations. Instead, we regard X b as a linear 
transformation from a fc-dimensional space S onto S, and perform the analysis in S. Let w denote 
a vector in 5; then X*w belongs to the space S spanned by the background ensemble perturbations, 
and x = x b + X b w is the corresponding model state. 

Notice that if w is a Gaussian random vector with mean and covariance (Jfc- then 
x = x* + X*w is Gaussian with mean x b and covariance = (k— l)~ 1 X b (X b ) T . This motivates the 
cost function 

7~( w ) = (k-l)w T w+[y°-H(x b + X b w)] T R- l [y°-H(x b + X h w)] (15) 

on S. In particular, we claim that if w fl minimizes /, then x a = x b + X b \v a minimizes the cost 
function J. Substituting the change of variables formula into ([141) and using (fT2l) yields the identity 

J(w) = (k-l)w T (l-(X b ) T [X b (X b ) T ]- l X b )w + J(x b + X b w). (16) 

The matrix I - (X b ) T [X b (X b ) T ]~ l X b is the orthogonal projection onto the null space N of X b . 
(Generally N will be one-dimensional, spanned by the vector (1,1,...,1), but it could be higher- 
dimensional.) Thus, the first term on the right side of (fT6l) depends only on the component of w in 
N, while the second term depends only on its component in the space orthogonal to N (which is in 
one-to-one correspondence with S under X*). Thus if w a minimizes J, then it must be orthogonal 
to N, and the corresponding vector x a minimizes J. 

A cost function equivalent to (fT5l) appears in [33]. More generally, implementations of 3D-Var 
and 4D-Var commonly use a preconditioning step that expresses the cost function in a form similar 
to CH). 

Nonlinear observations. The most accurate way to allow for a nonlinear observation operator H 
would be to numerically minimize / in the ^-dimensional space S, as in ll48l . If H is sufficiently 
nonlinear, then J could have multiple minima, but a numerical minimization using w = (corre- 
sponding to x = x b ) as an initial guess would still be a reasonable approach. Having determined w a 
in this manner, one would compute the analysis covariance P a in S from the second partial deriva- 
tives of / at w a , then use X* to transform the analysis results into the model space, as below. But 
to formulate the analysis more explicitly, we now linearize H about the background ensemble mean 
x*. Of course, if H is linear then we will find the minimum of J exactly. And if the spread of the 
background ensemble is not too large, the linearization should be a decent approximation, simi- 
lar to the approximation we have already made that a linear combination of background ensemble 
members is also a plausible background model state. 
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Since we only need to evaluate H in the ensemble space (or equivalently to evaluate H(x b + 
X'V) for w in S), the simplest way to linearize H is to apply it to each of the ensemble members 
x^W and interpolate. To this end, we define an ensemble y b W of background observation vectors by 

y M0 = #( x M0). (17) 

We define also their mean y b , and the £ x k matrix Y b whose zth column is y b W — y b . We then make 
the linear approximation 

H(x b + X b w)^y b + Y b w. (18) 

The same approximation is used in, for example, ETI . and is equivalent to the joint state-observation 
space method in (TJ. 

Analysis. The linear approximation we have just made yields the quadratic cost function 

7*( w ) = (k- \)w T w +[y° - f - Y b w} T R- l [y° - f - Y b w}. (19) 

This cost function is in the form of the Kalman filter cost function ©, using the background mean 
= and background covariance P b = (k— 1) I, with Y b playing the role of the observation 
operator. The analogues of the analysis equations (fTOT ) and (fTTT) are then 

w a = P a (Y b ) T R- l (y°-f), (20) 

P a = [(k-\)I+(Y b ) T R- l Y b }- 1 . (21) 
In model space, the analysis mean and covariance are then 

x fl = x fc + xV, (22) 

P" = X b P a (X b ) T . (23) 

To initialize the ensemble forecast that will produce the background for the next analysis, we must 
choose an analysis ensemble whose sample mean and covariance are equal to x a and P a . As men- 
tioned above, this amounts to choosing a matrix X" so that the sum of its columns is zero and (PT3T) 
holds. Then one can form the analysis ensemble by adding x a to each of the columns of X a . 

Symmetric square root. Our choice of analysis ensemble is described by X a = X b W a , where 

W a = [(k-\)P a ] 1/2 (24) 

and by the 1/2 power of a symmetric matrix we mean its symmetric square root. Then P a = 
(k— l) _1 W a (W a ) r , and (fT3l) follows from (|23l) . The use of the symmetric square root to deter- 
mine W a from P a (as compared to, for example, a Cholesky factorization, or the choice described 
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in flU), is important for two main reasons. First, as we will see below, it ensures that the sum of 
the columns of X a is zero, so that the analysis ensemble has the correct sample mean (this is also 
shown for the symmetric square root in PHI ). Second, it ensures that W a depends continuously on 
P a ; while this may be a desirable property in general, it is crucial in a local analysis scheme, so 
that neighboring grid points with slightly different matrices P a do not yield very different analysis 
ensembles. Another potentially desirable property of the symmetric square root is that it minimizes 
the (mean-square) distance between W a and the identity matrix, so that the analysis ensemble per- 
turbations are in this sense as close as possible to the background ensemble perturbations subject to 
the constraint on their sample covariance [36, 37 J. 

To see that the sum of the columns of X a is zero, we express this condition as X a v = 0, where v 
is a column vector of k ones: v = (l,l,...,l) r . Notice that by (|2T|) . v is an eigenvector of P a with 
eigenvalue (k— 1) : 

(P^-iy = [(k- 1)1+ (YYR-^Iv = (k- 1)V, 

because the sum of the columns of is zero. Then by (|24l) . v is also an eigenvector of W a with 
eigenvalue 1. Since the sum of the columns of X" is zero, X a \ = X b W a \ = X b \ = as desired. 

Finally, notice that we can form the analysis ensemble first in S by adding w a to each of the 
columns of W a ; let {w"^} be the columns of the resulting matrix. These "weight" vectors specify 
what linear combinations of the background ensemble perturbations to add to the background mean 
to obtain the analysis ensemble in model space: 

x «(0 =x b + X b w a ^. (25) 

Local implementation. Notice that once the background ensemble has been used to form y b 
and Y b , it is no longer needed in the analysis, except in (|25l) to translate the results from S to 
model space. This point is useful to keep in mind when implementing a local filter that computes 
a separate analysis for each model grid point. In principle, one should form a global background 
observation ensemble yj 7 ^ from the global background vectors, though in practice this can be done 
locally when the global observation operator uses local interpolation. After the background 
observation ensemble is formed, the analyses at different grid points are completely independent 
of each other and can be computed in parallel. The observations chosen for a given local analysis 
dictate which coordinates of yj^-j are used to form the local background observation ensemble y b ('\ 
and the analysis in S produces the local weight vectors {w fl ^}. Computing the analysis ensemble 
{x fl W} for the analysis grid point using (|25l) then requires only using the background model states 
at that grid point. 

As long as the sets of observations used for a pair of neighboring grid points overlap heavily, the 
local weight vectors {w a W} for the two grid points are similar. In a region over which the weight 



15 



vectors do not change much, the analysis ensemble members are approximately linear combinations 
of the background ensemble members, and thus should represent reasonably "physical" initial con- 
ditions for the forecast model. However, if the model requires of its initial conditions high-order 
smoothness and/or precise conformance to an conservation law, it may be necessary to post-process 
the analysis ensemble members to smooth them and/or project them onto the manifold determined 
by the conserved quantities before using them as initial conditions (this procedure is often called 
"balancing" in geophysical data assimilation). 

In other localization approaches ll2Tl[T7ll45l . the influence of an observation at a particular point 
on the analysis at a particular model grid point decays smoothly to zero as the distance between the 
two points increases. A similar effect can be achieved here by multiplying the entries in the inverse 
observation error covariance matrix R 1 by a factor that decays from one to zero as the distance of 
the observations from the analysis grid point increases. This "smoothed localization" corresponds 
to gradually increasing the uncertainty assigned to the observations until beyond a certain distance 
they have infinite uncertainty and therefore no influence on the analysis. 

Covariance inflation. In practice, an ensemble Kalman filter that adheres strictly to the Kalman 
filter equations (TTOb and (fTTT) may fail to synchronize with the "true" system trajectory that produces 
the observations. One reason for this is model error, but even with a perfect model, the filter tends 
to underestimate the uncertainty in its state estimate [|45l . Regardless of the cause, underestimating 
the uncertainty leads to overconfidence in the background state estimate, and, hence, the analysis 
underweights the observations. If the discrepancy becomes too large over time, the observations 
are essentially ignored by the analysis, and the dynamics of the data assimilation system become 
decoupled from the truth. 

Generally this tendency is countered by an ad hoc procedure (with at least one tunable pa- 
rameter) that inflates either the background covariance or the analysis covariance during each data 
assimilation cycle. (Doing this is analogous to adding a model error covariance term to the right side 
of ©, as is usually done in the Kalman filter.) One "hybrid" approach adds a multiple of the back- 
ground covariance matrix B from the 3D-Var method to the ensemble background covariance prior 
to the analysis ifToll . "Multiplicative inflation" [ElECTl instead multiplies the background covariance 
matrix (or equivalently, the perturbations of the background ensemble members from their mean) 
by a constant factor larger than one. "Additive inflation" adds a small multiple of the identity matrix 
to either the background covariance or the analysis covariance during each cycle Il36ll37l . Finally, 
if one chooses the analysis ensemble in such a way that each member has a corresponding member 
of the background ensemble, then one can inflate the analysis ensemble by "relaxation" toward the 
background ensemble: replacing each analysis perturbation from the mean by a weighted average 
of itself and the corresponding background perturbation ll47l . 
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Within the framework described in this article, the hybrid approach is not feasible because it re- 
quires the analysis to consider uncertainty outside the space spanned by the background ensemble. 
However, once the analysis ensemble is formed, one could develop a means of inflating it in direc- 
tions (derived from the 3D-Var background covariance matrix B or otherwise) outside the ensemble 
space so that uncertainty in these directions is reflected in the background ensemble at the next anal- 
ysis step. Additive inflation is feasible, but requires substantial additional computation to determine 
the adjustment necessary in the ^-dimensional space S that corresponds to adding a multiple of the 
identity matrix to the model space covariance or P a . Relaxation is simple to implement, and is 
most efficiently done in S by replacing W a with a weighted average of it and the identity matrix. 

Multiplicative inflation can be performed most easily on the analysis ensemble by multiplying 
W a by an appropriate factor (namely v /p, if one wants to multiply the analysis covariance by p). To 
perform multiplicative inflation on the background ensemble instead, one can multiply X b by such 
a factor, and adjust the background ensemble {x b ^} accordingly before applying the observation 
operator H to form the background observation ensemble {y b ®}. A more efficient approach, which 
is equivalent if H is linear, is simply to replace (k— 1)1 by (k— l)I/p in (f2T|) . since (k— 1)1 is the 
inverse of the background covariance matrix P h in the ^-dimensional space S. One can check that 
this has the same effect on the analysis mean x a and covariance P a as multiplying X* and by 
yfp. If p is close to one, this is a good approximation to inflating the background ensemble before 
applying the observation operator even when this operator is nonlinear. 

Multiplicative inflation of the background covariance can be thought of as applying a discount 
factor to the influence of past observations on the current analysis. Since this discount factor is 
applied during each analysis, the cumulative effect is that the influence of an observation on future 
analyses decays exponentially with time. The inflation factor determines the time scale over which 
observations have a significant influence on the analysis. Other methods of covariance inflation 
have a similar effect, causing observations from sufficiently far in the past essentially to be ignored. 
Thus, covariance inflation localizes the analysis in time. This effect is especially desirable in the 
presence of model error, because then the model can only reliably propagate information provided 
by the observations for a limited period of time. 

3 Efficient Computation of the Analysis 

Here is a step-by-step description of how to perform the analysis described in the previous section, 
designed for efficiency both in ease of implementation and in the amount of computation and mem- 
ory usage. Of course there are some trade-offs between these objectives, so in each step we first 
describe the simplest approach and then in some cases mention alternate approaches and possible 
gains in computational efficiency. We also give a rough accounting of the computational complexity 
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of each step, and at the end discuss the overall computational complexity. After that, we describe 
an approach that in some cases will produce a significantly faster analysis, at the expense of more 
memory usage and more difficult implementation, by reorganizing some of the steps. As before, we 
use "grid point" in this section to mean a spatial location in the forecast model, whether or not the 
model is actually based on a grid geometry; we use "array" to mean a vector or matrix. The use of 
"columns" and "rows" below is for exposition only; one should of course store arrays in whatever 
manner is most efficient for one's computing environment. 

The inputs to the analysis are a background ensemble of mu -dimensional model state vectors 
{ x j^ : i = 1,2, . . . , k}, a function from the m^j -dimensional model space to the -dimensional 
observation space, an £^ -dimensional vector yj\ of observations, and an x £^ observation error 
covariance matrix Rr g i . The subscript g here signifies that these inputs reflect the global model state 
and all available observations, from which a local subset should be chosen for each local analysis. 
How to choose which observations to use is entirely up to the user of this method, but a reasonable 
general approach is to choose those observations made within a certain distance of the grid point at 
which one is doing the local analysis and determine empirically which value of the cutoff distance 
produces the "best" results. If one deems localization to be unnecessary in a particular application, 
then one can ignore the distinction between local and global, and skip Steps [3] and |9] below. 

Steps [Hand [2] are essentially global operations, but may be done locally in a parallel implemen- 
tation. Steps [3]-[8] should be performed separately for each local analysis (generally this means for 
each grid point, but see the parenthetical comment at the end of Step©. Step |9] simply combines 
the results of the local analyses to form a global analysis ensemble {x^ }, which is the final output 
of the analysis. 

1. Apply to each xf^ to form the global background observation ensemble {yj^j }, and 
average the latter vectors to get the i^-dimensional column vector y?-,. Subtract this vector 

from each {y^j } to form the columns of the £^ x k matrix Yj^. (This subtraction can be 

done "in place", since the vectors {y^ } w& no longer needed.) This requires k applications 
of H, plus 2k£\ g ] (floating-point) operations. If H is an interpolation operator that requires 
only a few model variables to compute each observation variable, then the total number of 
operations for this step is proportional to k£u times the average number of model variables 
required to compute each scalar observation. 



2. Average the vectors { x rii } to S et the my^-dimensional vector xj^, and subtract this vector 



from each xjK to form the columns of the m^ xk matrix X? (Again the subtraction can be 



done "in place"; the vectors {xj^ } are no longer needed). This step requires a total of 2km^ 
tions. (If H is linear, one 
%} by applying H to x b [g] 



operations. (If H is linear, one can equivalently perform Step[2]before StepCQ and obtain yj^ 
and Yf , by applying H to if , and Xj\.) 
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3. This step selects the necessary data for a given grid point (whether it is better to form the 
local arrays described below explicitly or select them later as needed from the global arrays 
depends on one's implementation). Select the rows °f*\g\ and Xj^ corresponding to the given 
grid point, forming their local counterparts: the m-dimensional vector x b and the mxk matrix 
X b , which will be used in Step\8\ Likewise, select the rows °ffu-\ an d , corresponding to the 
observations chosen for the analysis at the given grid point, forming the l-dimensional vector 
y b and the £ x k matrix Y b . Select the corresponding rows of y? i and rows and columns of 
Rr g i to form the l-dimensional vector y° and the £x £ matrix R. (For a high-resolution model, 
it may be reasonable to use the same set of observations for multiple grid points, in which 
case one should select here the rows of Xj^ and xj^ corresponding to all of these grid points.) 

4. Compute the kx £ matrix C = (Y*) 7 ^ -1 . If desired, one can multiply entries of R 1 or C 
corresponding to a given observation by a factor less than one to decrease (or greater than one 
to increase) its influence on the analysis. (For example, one can use a multiplier that depends 
on distance from the analysis grid point to discount observations near the edge of the local 
region from which they are selected; this will smooth the spatial influence of observations, as 
described in Section [23] under "Local Implementation".) Since this is the only step in which R 
is used, it may be most efficient to compute C by solving the linear system RC r = Y b rather 
than inverting R. In some applications, R may be diagonal, but in others R will be block 
diagonal with each block representing a group of correlated observations. As long as the size 
of each block is relatively small, inverting R or solving the linear system above will not be 
computationally expensive. Furthermore, many or all of the blocks that make up R may be 
unchanged from one analysis time to the next, so that their inverses need not be recomputed 
each time. Based on these considerations, the number of operations required (at each grid 
point) for this step in a typical application should be proportional to k£, multiplied by a factor 
related to the typical block size of R. 

5. Compute the kxk matrix P a = Uk — l)I/p + CY^j , as in ([27]). Here p > 1 is a multiplica- 
tive covariance inflation factor, as described at the end of the previous section. Though trying 
some of the other approaches described there may be fruitful, a reasonable general approach 
is to start with p > 1 and increase it gradually until one finds a value that is optimal according 
to some measure of analysis quality. Multiplying C and Y* requires less than 2k 2 £ operations, 
while the number of operations needed to invert the kxk matrix is proportional to /c 3 . 

6. Compute the kx k matrix W = [(k— l)P a ] 1//2 , as in ft24\) . Again the number of operations 
required is proportional to k 3 ; it may be most efficient to compute the eigenvalues and eigen- 
vectors of [(k — l)I/p + CY b ~\ in the previous step and then use them to compute both P 7 and 
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7. Compute the k-dimensional vector Vf a = V a C{f-y b ), as in ri20l ), and add it to each column 
ofW a , forming a kx k matrix whose columns are the analysis vectors {w^ 1 -*}. Computing 
the formula for \v a from right-to-left, the total number of operations required for this step is 
less than 3k(£ + k). 

8. Multiply X b by each w a ^ and add x b to get the analysis ensemble members {x a ^} at the 
analysis grid point, as in ([25]). This requires 2k 2 m operations. 

9. After performing Steps \3^8\for each grid point, the outputs of Step\8\form the global analysis 
ensemble { x ui }• 

We now summarize the overall computational complexity of the algorithm described above. If 
p is the number local analyses performed (equal to the number of grid points in the most basic 
approach), then notice that pm = mu, while pi — qi\ g ], where £ is the average number of observa- 
tions used in a local analysis and q is the average number of local analyses in which a particular 
observation is used. If £ is large compared to k and m, then the most computationally expensive step 
is either Step \5\ requiring approximately 2k 2 p£ = 2k 2 q£^ operations over all the local analyses, 
or Step HI whose overall number of operations is proportional to kp£ = kq£^, but with a propor- 
tionality constant dependent on the correlation structure of Rr g i . In any case, as long as the typical 
number of correlated observations in a block of Ru remains constant, the overall computation time 
grows at most linearly with the total number £u of observations. It also grows at most linearly with 
the total number mr g i of model variables; if mu is large enough compared to £[g], then the most 
expensive step is Step[8l with 2k 2 m^ overall operations. The terms in the computation time that 
grow with the number of observations or number of model variables are at most quadratic in the 
number k of ensemble members. However, for a sufficiently large ensemble, the matrix operations 
in Steps |5] and [6] that take of order k 3 operations per local analysis, or k 3 p operations overall, will 
become significant. 

In Section |51 we present some numerical results for which we find the computation time indeed 
grows roughly quadratically with k, linearly with q, and sublinearly with £y g y 

Batch processing of observations. Some of the steps above have a g-fold redundancy, in that 
computations involving a given observation are repeated over an average of q different local analy- 
ses. For a general observation error covariance matrix Rr g i this redundancy may be unavoidable, but 
it can be avoided as described below if the global observations can be partitioned into local groups 
(or "batches") numbered 1,2,... that meet the following conditions. First, all of the observations in 
a given batch must be used in the exact same subset of the local analyses. Second, observations in 
different batches must have uncorrelated errors, so that each batch j corresponds to a block R, in 
a block diagonal decomposition of R[ g j . (These conditions can always met if R[ g ] is diagonal, by 
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making each batch consist of a single observation. However, as explained below, for efficiency one 
should make the batches as large as possible while still meeting the first condition.) Then at Step [3] 
instead of selecting (overlapping) submatrices of y?y Yj^, y?y and R[ g j, for each grid point, let y*, 
Yj, y", represent the rows corresponding to the observations in batch j, and do the following for 
each batch. Compute and store the k x k matrix C/Yy and the fc-dimensional vector Cy(y? — y*), 
where Cj = (Y b j) T Rj l as in StepUl (This can be done separately for each batch, in parallel, and 
the total number of operations is roughly 2k 2 £^ g y) Then do Steps [5]-[8] separately for each local 
analysis; when CY* and C(y° — y ) are required in Steps [5] and [7J compute them by summing the 
corresponding arrays CjY b j and Cj(j° — y*) over the batches j of observations that are used in the 
local analysis. To avoid redundant addition in these steps, batches that are used in exactly the same 
subset of the local analyses should be combined into a single batch. The total number of operations 
required by the summations over batches roughly k 2 ps, where s is the average number of batches 
used in each local analysis. Both this and the 2& 2 £r g ] operations described before are smaller than 
the roughly 2k 2 p£ = 2k 2 q£^ operations they combine to replace. 

This approach has similarities with the "sequential" approach of [EH and [|45l . in which ob- 
servations are divided into uncorrelated batches and a separate analysis is done for each batch; the 
analysis is done in the observation space whose dimension is the number of observations in a batch. 
However, in the sequential approach, the analysis ensemble for one batch of observations is used 
as the background ensemble for the next batch of observations. Since batches with disjoint local 
regions of influence can be analyzed separately, some parallelization is possible, though the LETKF 
approach described above is more easily distributed over a large number of processors. For a serial 
implementation, either approach may be faster depending on the application and the ensemble size. 

4 Asynchronous Observations: 4D-LETKF 

In theory, one can perform a new analysis each time new observations are made. In practice, this 
is a good approach if observations are made at regular and sufficiently infrequent time intervals. 
However, in many applications, such as weather forecasting, observations are much too frequent 
for this approach. Imagine, for example, a 6-hour interval between analyses, like at the National 
Weather Service. Since weather can change significantly over such a time interval, it is important 
to consider observations taken at intermediate times in a more sophisticated manner than to pretend 
that they occur at the analysis time (or to simply ignore them). Operational versions of 3D-Var and 
4D-Var (see Section [2T2T) do take into account the timing of the observations, and one of the primary 
strengths of 4D-Var is that it does so in a precise manner, by considering which forecast model 
trajectory best fits the observations over a given time interval (together with assumed background 
statistics at the start of this interval). 
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We have seen that the analysis step in an ensemble Kalman filter considers model states that 
are linear combinations of the background ensemble states at the analysis time, and compares these 
model states to observations taken at the analysis time. Similarly, we can consider approximate 
model trajectories that are linear combinations of the background ensemble trajectories over an 
interval of time, and compare these approximate trajectories with the observations taken over that 
time interval. Instead of asking which model trajectory best fits the observations, we ask which 
linear combination of the background ensemble trajectories best fits the observations. As before, this 
is relatively a low-dimensional optimization problem that is much more computationally tractable 
than the full nonlinear problem. 

This approach is similar to that of an ensemble Kalman smoother ifTOl [H, but over a much 
shorter time interval. As compared to a "filter", which estimates the state of a system at time 
t using observations made up to time t, a "smoother" estimates the system state at time t using 
observations made before and after time t. Over a long time interval, one must generally take a more 
sophisticated approach to smoothing than to simply consider linear combinations of an ensemble of 
trajectories generated over the entire interval, both because the trajectories may diverge enough that 
linear combinations of them will not approximate model trajectories, and because in the presence of 
model error there may be no model trajectory that fits the observations over the entire interval. Over 
a sufficiently short time interval however, the approximation of true system trajectories by linear 
combinations of model trajectories with similar initial conditions is quite reasonable. 

While this approach to assimilating asynchronous observations is suitable for any ensemble 
Kalman filter ll23l . it is particularly simple to implement in the LETKF framework. We call this 
extension 4D-LETKF; see |fT9l for an alternate derivation of this algorithm. 

To be more concrete, suppose that we have observations (T/,y£ ) taken at various times Xj since 
the previous analysis. Let H Tj be the observation operator for time Xj and let be the error 
co variance matrix for these observations. In Section I2.3L we mapped a vector w in the ^-dimensional 
space S into observation space using the formula y* + Y fo w, where the background observation 
mean y b and perturbation matrix Y b were formed by applying the observation operator H to the 
background ensemble at the analysis time. So now, for each time Xj, we apply H tj to the background 
ensemble at time Xj, calling the mean of the resulting vectors y b . and forming their differences from 
the mean into the matrix Y b T .. 

We now form a combined observation vector y° by concatenating (vertically) the (column) vec- 
tors y° t ., and similarly by vertical concatenation of the vectors y b . and matrices Y b . respectively, we 
form the combined background observation mean y b and perturbation matrix Y b . We form the cor- 
responding error covariance matrix R as a block diagonal matrix with blocks R Tj (this assumes that 
observations taken at different times have uncorrelated errors, though such correlations if present 
could be included in R). 
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Given this notation, we can then use the same analysis equations as in the previous sections, 
which are based on minimizing the cost function J* given by (fT9l) . (We could instead write down 
the appropriate analogue to (fT5l) and minimize the resulting nonlinear cost function J; this would 
be no harder in principle than in the case of synchronous observations.) Referring to Section [3l 
the only change is in Step [H which one should perform for each observation time T/ (using the 
background ensemble and observation operator for that time) and then concatenate the results as 
described above. Step [2] still only needs to be done at the analysis time, since its output is used only 
in Step [8] to form the analysis ensemble in model space. All of the intermediate steps work exactly 
the same, in terms of the output of StepQ] 

In practice, the model will be integrated with a discrete time step that in general will not coincide 
with the observation times Xj. One should either interpolate the background ensemble trajectories 
to the observation times, or simply round the observation times off to the nearest model integration 
time. In either case, one must either store the background ensemble trajectories until the analysis 
time, or perform Step Q] of Section [3] during the model integration and store its results. The latter 
approach will require less storage if the number of observations per model integration time step is 
less than the number of model variables. 

One can perform localization in the same manner as with synchronous observations, but it may 
be advantageous to take into account the timing of the observations when deciding which of them 
to use in a given local analysis. For example, due to spatial propagation in the model dynamics, 
one may wish to include earlier observations from a greater distance than later observations. On the 
other hand, earlier observations may be less useful than observations closer to the analysis time due 
to model error; it may help then to decrease the influence of the earlier observations as described in 
Step Hof Section |U 

5 Numerical Experiments with Real Atmospheric Observations 

We have implemented the 4D-LETKF algorithm, as described in Sections |3] and HI with the oper- 
ational Global Forecast System (GFS) model iTPfll of the U.S. National Centers for Environmental 
Prediction (NCEP). This model is used (currently at higher resolution than we describe below) for 
National Weather Service forecasts. Previously we have published results using this model with the 
LEKF algorithm of Il3~6ll3~7ll . in a perfect model scenario (with simulated observations) iflTTl . Using 
the same parameters for the LETKF algorithm, we have obtained results very similar to those in 
Ipm . which we do not repeat here; with LETKF, the data assimilation steps run 3 to 5 times as fast 
as with LEKF. 

Here, we present some preliminary results obtained using the 4D-LETKF algorithm with the 
same model and real atmospheric observations collected in January and February 2004, and com- 
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pare them with results from the NCEP Spectral Statistical Interpolation (SSI) [|T3Tl , a state-of-the-art 
implementation of 3D-Var. Further results will appear in a future publication. Our data set in- 
cludes all operationally assimilated observations except for satellite radiances and measurements 
of atmospheric humidity. Observations include vertical sounding profiles of temperature and wind 
by weather balloons, surface pressure observations by land and sea stations, temperature and wind 
reports by commercial aircraft, and wind vectors derived from satellite based observation of clouds. 

For all of the results below, we assimilate observations every 6 hours, and we use a model 
resolution of T62 (a 192 x 94 longitude-latitude grid) with 28 vertical levels, for a total of about 
500,000 points. In our 4D-LETKF implementation, for each grid point, we selected observations 
from within ahxhxv subset of the model grid, centered at the analysis grid point, with the vertical 
height v varying (depending on the vertical level) from 1 to 7 grid points as in pTfl. and the horizontal 
width h held constant for each experiment at either 5 or 7 grid points. The number of ensemble 
members k we use in each experiment is either 40 or 80. In all 4D-LETKF experiments we used 
a spatially-dependent multiplicative covariance inflation factor p, which we taper from 1.15 at the 
surface to 1.1 at the top of the model atmosphere in the Southern Hemisphere, and from 1.25 at the 
surface to 1 . 15 at the top in the Northern Hemisphere (between 30°S and 30°N latitudes, we linearly 
interpolate between these values). 

5.1 Analysis Quality 

In this section, we compare the analyses from our 4D-LETKF implementation, using k = 40 ensem- 
ble members and a 7 x 7 horizontal grid (h = 7) for each local region, and from the NCEP SSI, using 
the same model resolution and the same observational data set for its global analysis (we call this the 
"benchmark" SSI analysis). To estimate the analysis error for a given state variable, we compute 
the spatial RMS difference between its analysis and the operational high-resolution SSI analysis 
computed by NCEP (we call this the "verification" SSI analysis). While this verification technique 
favors the benchmark SSI analysis, which is obtained with the same data assimilation method, it can 
provide useful information in regions where the 4D-LETKF and benchmark SSI analyses exclude 
a large portion of the observations assimilated into the verifying SSI analysis. Such a region is the 
Southern Hemisphere, where satellite radiances are known to have a strong positive impact on the 
quality of the analysis. 

We initialize 4D-LETKF with a random ensemble of physically plausible global states at mid- 
night on 1 January 2004. Specifically, we take each initial ensemble member from an operational 
NCEP analysis at a randomly chosen time between 15 January and 31 March 2004. The 4D-LETKF 
analyses start to synchronize with the observations after a few days. To exclude from our compari- 
son the transient error due to initialization of 4D-LETKF, we average all estimated errors over the 
month of February 2004 only. 
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Figure 1 shows the estimated analysis error of each method for temperature in the Southern 
Hemisphere extratropics (20°S to 90°S latitudes) as a function of atmospheric pressure. The 4D- 
LETKF analysis is more accurate than the benchmark SSI at all except near the surface, where the 
two methods are quite similar in accuracy. The advantage of the 4D-LETKF analysis is especially 
large in the upper atmosphere, where observations are extremely sparse. Figure 2 makes the same 
comparison between the 48-hour forecasts generated from the respective analyses, again verified 
against the operational high-resolution SSI analysis at the appropriate time. We see that the 4D- 
LETKF forecasts are also more accurate than those from the background SSI analysis, especially in 
the upper atmosphere. 




RMS analysis temperature error (K) 



Figure 1: Vertical profile of the estimated analysis temperature error, in degrees Kelvin, for the 4D- 
LETKF (solid) and benchmark SSI (dashed) analyses in the Southern Hemisphere extratropics. The 
atmospheric height is indicated by pressure, in hectopascal. The estimate of the error is obtained by 
calculating the root-mean- square difference between each analysis and the verifying SSI analysis 
for latitudes between 20°S and 90°S and averaging over all the analysis times in February 2004. 

For wind in the Southern Hemisphere and temperature and wind in the Northern Hemisphere, 
the RMS analysis and forecast errors for the 4D-LETKF and benchmark SSI are more similar to 
each other, though in all cases the 4D-LETKF results are significantly better in the highest part of 
the atmosphere (0 hPa to 100 hPa). 

We emphasize that these results are obtained with modest tuning of the 4D-LETKF parameters, 
and we expect further significant improvements from a more thorough exploration of the algorithm's 
parameter space, as well as a more sophisticated approach to model error, such as the adaptive bias- 
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Figure 2: Estimated 48-hour forecast temperature error versus atmospheric pressure for the 4D- 
LETKF (solid) and benchmark SSI (dashed) methods in the Southern Hemisphere extratropics. 
The estimated forecast error is computed in the same way as the estimated analysis error shown in 
Figure 1. 

correction technique of Q. 

Qualitatively similar results with the same model and a similar data set are reported in 11461 , using 
both an alternate implementation of the 4D-LETKF algorithm (with a different covariance inflation 
approach) and a related method based on the Ensemble Square-Root Filter of ll4~5l . The latter method 
was slightly more accurate in the Northern Hemisphere and slightly less accurate in the Southern 
Hemisphere, and both methods were more accurate than the corresponding benchmark SSI, when 
verified both against the operational high-resolution SSI analysis and against observations. See 
also lf22l and [|34l for comparisons of ensemble Kalman filter results to those of other operational 
3D-Var methods, using forecast models from the Meteorological Service of Canada and the Japan 
Meteorological Agency, respectively; the latter article also implements the LETKF approach. 

5.2 Computational Speed 

In this section, we present and discuss timing results from several representative analyses of the 
4D-LETKF experiment using the GFS described above, with different numbers of observations. In 
addition, we vary the number of ensemble members (k = 40 or 80) and the horizontal width of 
the local region (h = 5 or 7 grid points in both latitude and longitude). Though we use a parallel 
implementation, we report in Table 1 below the total CPU time used on a Linux cluster of forty 3.2 
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GHz Intel Xeon processors. The actual run time is many times faster; with the larger local region 
(h = 7) the analysis takes about 6 minutes on our cluster with k = 40 ensemble members, and 18 
minutes with k = 80. Thus, the results shown in Figures 1 and 2 can be obtained in an operational 
setting that allots only a few minutes for each analysis. Furthermore, because the observations 
are very nonuniformly distributed spatially, we expect to be able to reduce the parallel run time 
considerably by balancing the load more evenly between processors. We will report details of our 
parallel implementation in a future publication. 

Table 1 shows the total CPU time in seconds for 4 different 4D-LETKF parameter sets at each 
of 4 different analysis times. Different numbers of observations are available for each analysis time, 
with about 50% more at 1200 GMT than at 0600 GMT. The computation time generally grows with 
the number of observations, though not by as large a factor. Referring to the discussion immediately 
following Steps Q] to [9] in Section [3l this indicates that the matrix multiplication portion of Step [5] 
that requires on the order of k 2 qi\^ total floating point operations is a significant component of the 
computation time, but that other parts of the computation are significant too. (Recall that £r g i is the 
global number of observations and q is the average number of analyses in which each observation is 
used, which in this implementation is roughly the average number of grid points per local region.) 
As h increases from 5 to 7, the value of q approximately doubles, and so does the computation 
time. And as k increases from 40 to 80, the computation time grows by a factor of 4 to 5, indicating 
that the time is roughly quadratic in k but suggesting that terms that are cubic in k are becoming 
significant. 



analysis time 


0600 GMT 


1800 GMT 


0000 GMT 


1200 GMT 


# observations 


159,947 


193,877 


236,168 


245,850 


k = 40, h = 5 


945 


945 


1244 


1142 


k = A0,h = 7 


1846 


2076 


2105 


2200 


k = S0,h = 5 


4465 


4453 


5124 


5010 


k = S0,h = 7 


9250 


10631 


10463 


10943 



Table 1: Total CPU time in seconds (on 3.2GHz Intel Xeon processors) for various analyses with 
4D-LETKF using the GFS with approximately 500,000 model grid points. Columns represent 
different analysis times, arranged in increasing order of the number of observations assimilated. 
Rows represent different values of the ensemble size k and horizontal localization width h. Notice 
that even on a single processor, all of these analyses can be done in less than real time. 

Indeed, examining the CPU time spent in various subroutines on different processors confirms 
that most of the time is spent in Steps [5] and [61 and that in local analyses where observations are 
dense, the matrix multiplication in Step [5] dominates the computation time, while in local analyses 



27 



where observations are sparse, the matrix inverse and square root in Steps [5] and [6] dominate. We find 
that the latter operations take more time in the analyses with the larger local region; this suggests 
that the iterate eigenvalue routine we use takes longer to run in cases when the presence of more 
observations causes P a to be further from a multiple of the identity matrix. There is also some 
computational overhead not accounted for in Section [3] whose contribution to the computation time 
is not negligible, in particular determining which observations to use for each local analysis. 

Overall, our timing results indicate that with a model and data set of this size, a substantially 
larger ensemble size than we currently use may be problematic, but that our implementation of 
4D-LETKF should be able to assimilate more observations with at most linear growth in the com- 
putation time. Furthermore, though we do not vary the number of model variables in Table 1, our 
examination of the time spent performing each of the steps from Section [3] suggests that we can 
increase the model resolution significantly without having much effect on the analysis computation 
time (though the time spent running the model would of course increase accordingly). 

6 Summary and Acknowledgments 

In this article, we have described a general framework for data assimilation in spatiotemporally 
chaotic systems using an ensemble Kalman filter that in its basic version (Section [3]) is relatively 
efficient and simple to implement. In a particular application, one may be able to improve accuracy 
by experimenting with different approaches to localization (see the discussion in Sections 12.21 and 
12.31) . covariance inflation (see the end of Section l2~3l) . and/or asynchronous observations (SectionlU). 
For very large systems and/or when maximum efficiency is important, one should consider carefully 
the comments about implementation in Section [3] (and at the end of Section HI if applicable). One 
can also apply this method to low-dimensional chaotic systems, without using localization. 

In Section [51 we presented preliminary results for a relatively straightforward implementation 
of the LETKF approach with real atmospheric data and an operational global forecast model. Our 
results demonstrate that this implementation can produce results of operational quality within a few 
minutes on a parallel computer of reasonable size. The efficiency of the basic algorithm provides 
many opportunities to improve the quality of the results with the variations discussed and referred 
to in this article. 
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